Abstract

Inspired by online ad allocation, we study online stochastic packing linear programs from theoretical 
and practical standpoints. We first present a near-optimal online algorithm for a general class of packing 
linear programs which model various online resource allocation problems including online variants of 
routing, ad allocations, generalized assignment, and combinatorial auctions. As our main theoretical 
result, we prove that a simple primal-dual training-based algorithm achieves a $(1 - o(1))$-approximation
guarantee in the random order stochastic model. This is a significant improvement over logarithmic or
constant-factor approximations for the adversarial variants of the same problems (e.g. factor $1 - 1/e$ for
online ad allocation, and $\log(m)$ for online routing). We then focus on the online display ad allocation
problem and study the efficiency and fairness of various training-based and online allocation algorithms
on data sets collected from a real-life display ad allocation system. Our experimental evaluation confirms
the effectiveness of training-based primal-dual algorithms on real data sets, and also indicates an intrinsic
trade-off between fairness and efficiency.

1 Introduction 

Online stochastic optimization is a central problem in operations research with many applications in dynamic 
resource allocation. In these settings, given a set of resources, demands for the resources arrive online, with 
associated values; given a general prior about the demands, one has to decide whether and how to satisfy 
(i.e., allocate the desired resources to) a demand when it arrives. The goal is to find a valid assignment with 
maximum total value. Such problems appear in many areas including online routing [13, 4], online combinatorial
auctions [16], online ad allocation problems [32, 18, 19], and online dynamic pricing and inventory
management problems. For example, in routing problems, we are given a network with capacity constraints 
over edges; customers arrive online and bid for a subset of edges (typically a path) in the network, and the 
goal is to assign paths to new customers so as to maximize the total social welfare. Similarly, in online 
combinatorial auctions, bidders arrive online and may bid on a subset of resources; the auctioneer should 
decide whether to sell those resources to the bidder. In the display ads problem, when users visit a website, 
the website publisher has to choose ads to show them so as to maximize the value of the displayed ads.

In this paper, we study these online stochastic resource allocation problems from theoretical and practical
standpoints. Our theoretical results apply to a general set of problems including all those discussed above. 
Our practical results apply to the problem of display ads and give additional validation of our theoretical 
models and results. 

More specifically, we consider the following general class of packing linear programs (PLP): Let $J$ be
a set of $m$ resources; each resource $j \in J$ has a capacity $c_j$. The set of resources and their capacities are
known in advance. Let $I$ be a set of $n$ agents that arrive one by one online, each with a set of options $O_i$.
Each option $o \in O_i$ of agent $i$ has an associated value $w_{io} \ge 0$ and requires $a_{ioj} \ge 0$ units of each resource
$j$. The set of options and the values $w_{io}$ and $a_{ioj}$ arrive together with agent $i$. When an agent arrives, the
algorithm has to immediately decide whether to assign the agent and if so, which option to choose. The goal
is to find a maximum-value allocation that does not allocate more of any resource than is available.
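To make the PLP definition concrete, the following is a minimal sketch of one way to represent an instance and to evaluate a candidate allocation. The class and function names (`Option`, `Agent`, `evaluate`) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Option:
    value: float                 # w_io >= 0
    usage: Dict[str, float]      # a_ioj >= 0, keyed by resource j (missing key means 0)


@dataclass
class Agent:
    options: List[Option]        # the option set O_i, revealed when agent i arrives


def evaluate(capacity: Dict[str, float], agents: List[Agent],
             choice: List[Optional[int]]) -> Optional[float]:
    """Return the total value of an allocation, or None if it overuses a resource.

    choice[i] is the index of the option picked for agent i, or None if
    agent i is left unassigned.
    """
    used = {j: 0.0 for j in capacity}
    total = 0.0
    for agent, picked in zip(agents, choice):
        if picked is None:
            continue
        opt = agent.options[picked]
        total += opt.value
        for j, amount in opt.usage.items():
            used[j] += amount
    # Small tolerance so that exact fits are not rejected by rounding error.
    if any(used[j] > capacity[j] + 1e-9 for j in capacity):
        return None
    return total
```

An online algorithm must fix `choice[i]` when agent `i` arrives; `evaluate` only checks the final outcome.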

In the adversarial or worst-case setting, no online algorithm can achieve any non-trivial competitive
ratio; consider the simple case of one resource with capacity one and two agents. For each agent there are
just two options, namely to get the resource or not to get it. If an agent gets the resource, he uses its whole
capacity. The first agent has value 100 for getting the resource and value 0 for not getting the resource. If he
is assigned the resource, then the value of the second agent for getting the resource is 10000, otherwise it is
1. In both cases the algorithm achieves at most 1/100th of the value of the optimal solution. This example
can easily be generalized to show that no non-trivial competitive ratio is possible.
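The two-agent example can be checked by enumerating the algorithm's only real decision. This is an illustrative sketch; the function name `adversarial_ratio` is ours.

```python
def adversarial_ratio(assign_first: bool) -> float:
    """Ratio ALG/OPT on the two-agent instance, given the algorithm's first decision."""
    first_value = 100
    # The adversary fixes the second agent's value after seeing the decision.
    second_value = 10000 if assign_first else 1
    if assign_first:
        alg = first_value          # the resource is used up; agent 2 gets nothing
    else:
        alg = second_value         # best case: give the resource to agent 2
    opt = max(first_value, second_value)  # offline optimum serves the better agent
    return alg / opt


for decision in (True, False):
    print(decision, adversarial_ratio(decision))
```

Replacing 100 and 10000 by larger values makes the ratio as small as desired.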

Since in the adversarial setting the lack of prior information about the arrival rate of different types of 
agents implies strong impossibility results, it is natural to consider stochastic settings for online allocation 
problems, where we may have some prior information about the arrival rate of different types of agents. In 
particular, we consider the random-order stochastic model, where the order in which impressions arrive is 
random, but we do not have any other prior information. We present a training-based online algorithm for 
the general class of packing linear programs described above and prove that in the random-order stochastic 
model, it achieves an approximation ratio of $1 - \epsilon$ under some mild assumptions.¹ This result also implies
the same result in the i.i.d. model.²

Our training-based primal-dual algorithm for the stochastic PLP problem observes the first $\epsilon$ fraction
of the input and then solves an LP on this instance. (This requires knowing the number of agents in advance,
which is unavoidable for any sub-logarithmic approximation; see Theorem 9.) For each resource, the
corresponding dual variable extracted from this LP serves as a (posted) price per unit of the resource for
the remaining agents. The algorithm allocates to each remaining agent the option maximizing his utility,
defined as the difference between the value of an option and the price he must pay to obtain the necessary
resources. We prove that this algorithm provides a $1 - \epsilon$ approximation for the large class of natural packing
problems we consider, provided that no individual option for any agent consumes too much of any resource
or provides too large a fraction of the total value. Specifically, we show the following result. Recall that $n$
and $m$ denote the number of agents and resources, respectively; $q$ denotes $\max_i |O_i|$, and OPT the value of
an optimal offline solution to the PLP problem.

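Theorem 1. The Training-Based Primal-Dual algorithm is $(1 - O(\epsilon))$-competitive (a PTAS) for the online
stochastic PLP problem with high probability, as long as (1) $\max_{i,o} \left\{ \frac{w_{io}}{\mathrm{OPT}} \right\} \le \frac{\epsilon}{(m+1)(\ln n + \ln q)}$ and (2)
$\max_{i,o,j} \left\{ \frac{a_{ioj}}{c_j} \right\} \le \frac{\epsilon^3}{(m+1)(\ln n + \ln q)}$.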



1 In this context, an "α-approximation" means that with high probability under the randomness in the stochastic model, the
algorithm achieves at least an α fraction of the value (efficiency) of the offline optimal solution for the actual instance.

2 In the i.i.d. model, each impression arrives independently and identically according to a particular but unknown probability
distribution over the set of possible types of impressions [23]. Our stochastic model captures the i.i.d. model.

1.1 Applications

Theorem 1 has many applications; we elaborate on several, including routing problems, online combinatorial
auctions, the display ad problem, and the AdWords allocation problem. For each of these problems, we
improve on the known results for the online version. In each, we will comment on the interpretation of the
two conditions of Theorem 1 in that application.

In the online routing problem, we are initially given a network with capacity constraints over the $m$
edges. When a customer $i \in I$ arrives online, she wishes to send $d_i$ units of flow between some vertices $s_i$
and $t_i$, and derives $w_i$ units of value from sending such flow. Thus, the set of options $O_i$ for customer $i$ is
the set of all $s_i$-$t_i$ paths in the network. The algorithm must pick a set of customers $I^* \subseteq I$, and satisfy
their demands by allocating a path to each of them while respecting the capacity constraints on each edge;
the goal is to maximize the total value of satisfied customers. For this problem, the dual variables learned
from the sample yield a price for each edge; each customer is allocated the minimum-cost $s_i$-$t_i$ path if its
cost is no more than $w_i$. In road networks, for instance, these dual variables can be interpreted as the tolls to
be charged to prevent congestion. Theorem 1 applies when the contributions of individual agents/vehicles
to the total objective or to road congestion are small. As one such example, over a million vehicles enter or
leave Manhattan daily, with the George Washington Bridge alone carrying several hundred thousand. Online
routing problems have been studied extensively in the adversarial model when demands can be large, and
there are (poly)-logarithmic lower and upper bounds even for special cases [4, 13]. Our approach gives a
$(1 - o(1))$-approximation for the described stochastic variants of these problems.
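The posted-price rule for routing described above can be sketched as a shortest-path computation. This is a hedged illustration, not the authors' implementation: we assume `prices[(u, v)]` is the learned price per unit of flow on directed edge `(u, v)`, so a customer with demand `demand` pays `demand` times the summed price of her path. The function names are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def cheapest_path(prices: Dict[Edge, float], source: str,
                  target: str) -> Tuple[float, Optional[List[str]]]:
    """Dijkstra over directed edges weighted by their (non-negative) unit prices."""
    adjacency: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), price in prices.items():
        adjacency.setdefault(u, []).append((v, price))
    best = {source: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        for v, price in adjacency.get(u, []):
            new_cost = cost + price
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    return float("inf"), None


def route_customer(prices: Dict[Edge, float], source: str, target: str,
                   demand: float, value: float) -> Optional[List[str]]:
    """Accept the customer on the cheapest path iff its total price is at most value."""
    unit_cost, path = cheapest_path(prices, source, target)
    if path is not None and demand * unit_cost <= value:
        return path
    return None
```

A customer whose cheapest path costs more than her value is rejected, which is exactly the "non-negative gain" test of the general algorithm.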

In the combinatorial auction problem, we are initially given a set $J$ of $m$ goods, with $c_j$ units for each
good $j \in J$. Agents arrive online, and the options for agent $i$ may include different bundles of goods he
values differently; option $o \in O_i$ provides $w_{io}$ units of value, and requires $a_{ioj}$ units of good $j$. We wish to
find a valid allocation maximizing social welfare. Here, the dual variables learned from the sample yield a
price per unit of each good; each agent picks the option that maximizes his utility. Theorem 1 applies
as long as no individual agent controls a large fraction of the market, and as long as the set of options for
any single agent is at most exponential in the number of resources. These conditions often hold, as in cases
when bidders are single-minded or the number of bundles they are interested in is polynomial in $n$, or if their
options correspond to using different subsets of the resources. We also observe that the posted prices result
in a take-it-or-leave-it auction, and thus a truthful online allocation mechanism. Revenue maximization in
online auctions using sequence item pricing has been explored recently in the literature ECHO. lfl6l achieves 
constant-factor approximations for these problems in more general models than we consider. 

In the Display Ads Allocation (DA) problem [19], there is a set $J$ of $m$ advertisers who have paid a
web publisher for their ads to be shown to visitors to the website. The contract bought by advertiser $j$
specifies an integer upper bound $n(j)$ on the number of impressions that $j$ is willing to pay for. A set $I$
of impressions arrives online, each impression $i$ with a value $w_{ij} \ge 0$ for advertiser $j$. Each impression
can be assigned to at most one advertiser, i.e., there are $m$ options for each impression, and each option $o$
has $a_{ioj} = 1$ for advertiser $j$. The goal is to maximize the value of all the assigned impressions. The dual
variables learned from the sample yield a discount factor $\beta_j$ for each advertiser $j$, and the algorithm is to
assign an impression to the advertiser $j$ that maximizes $w_{ij} - \beta_j$. The contracts for advertisers typically involve
thousands of impressions, so the contribution of any one impression/agent is small, and the hypotheses of
Theorem 1 hold. The adversarial online DA problem was considered in [19], which showed that the problem
was inapproximable without exploiting free disposal; using this property, a simple greedy algorithm is
1/2-competitive, which is optimal. When the demand of each advertiser is large, a $(1 - 1/e)$-competitive algorithm
exists (see [19] for details of the model and results), and this is the best possible. For the unweighted
(max-cardinality) version of this problem in the i.i.d. model, a 0.67-competitive algorithm has been recently
developed [20]; this improves the known $(1 - 1/e)$-approximation algorithm for online stochastic matching [25].

The AdWords (AW) problem [32, 18] is related to the DA problem; here we allocate impressions resulting
from search queries. Advertiser $j$ has a budget $b(j)$ instead of a bound $n(j)$ on the number of impressions.
Assigning impression $i$ to advertiser $j$ consumes $w_{ij}$ units of $j$'s budget instead of 1 of the $n(j)$ slots, as in the
DA problem. Several approximation algorithms have been designed for the offline AW problem [15, 34, 5].
For the online setting, if every weight is very small compared to the corresponding budget, there exist
$(1 - 1/e)$-competitive online algorithms [32, 12, 24, 2], and this factor is tight. In order to go beyond the
competitive ratio of $1 - 1/e$ in the adversarial model, stochastic online settings have been studied, such as the
random order and i.i.d. models [23]. Devanur and Hayes [18] described a primal-dual $(1 - \epsilon)$-approximation
algorithm for this problem in the random order model, with the assumption that OPT is larger than $O(m^2 \log n / \epsilon^3)$
times each $w_{ij}$, where $m$ is the number of advertisers; Theorem 1 can be viewed as generalizing this result
to a much larger class of problems.

1.2 Experimental Validation 

For the applications described above, stochastic models are reasonable as the algorithm often has an idea 
of what agents to expect. For example, in the Display Ad Allocation problem, agents correspond to users 
visiting the website of a publisher who has sold contracts to advertisers. As the publisher most likely sees 
similar user traffic patterns from day to day, he has an idea of the available ad inventory based on historical 
data. In Section 5, we perform preliminary experiments on real instances of the DA problem, using actual
display ad data for a set of anonymous publishers. As with any real application, there are additional features
of the problem, and in the one we considered, both fairness and efficiency were important metrics. Hence,
we also evaluated our algorithms for fairness (see Section 3 for a precise definition); we compared the
efficiency and fairness of our training-based algorithm with those of algorithms from [19] designed for the
adversarial setting, as well as hybrid algorithms combining the two approaches. We propose a new approach
for evaluating the fairness of an allocation, based on finding an "ideal" fair allocation, and measuring the
distance to that allocation. Our experimental results validate Theorem 1 for this application, as they show
that on this real data set, training indeed helps efficiency by 5-12%, and that the online algorithms from [19]
are significantly better than a simple greedy approach.

1.3 Other Related Work 

Our proof technique is similar to that of [18] for the AW problem; it is based on their observation that dual
variables satisfy the complementary slackness conditions on the first $\epsilon$ fraction of impressions and
approximately satisfy these conditions on the entire set. However, one key difference is that in the AW problem,
the coefficients for variable $x_{ij}$ in the linear program are the same in both the constraint and the objective
function. That is, the contribution an impression makes to an advertiser's value is identical to the amount of
budget it consumes. In contrast, in the general class of packing problems that we study, these coefficients
are unrelated, which complicates the proof.

The random-order model has been considered for several problems, often called secretary problems.
The elements arriving online are often the ground set of an appropriate matroid, and the goal is to find
a maximum-weight independent set in the matroid; such problems include finding a maximum-value set
of $k$ elements [27], or finding a maximum spanning forest in a graph when edges appear online. Other
secretary problems include finding a maximum-weight set of items that fits in a knapsack. (See [6] for a
survey of these and other results.) Constant-competitive algorithms are known for these problems; without
additional assumptions (such as those of Theorem 1), no algorithm can achieve a competitive ratio better
than $1/e$. Specifically for the DA problem, the results of [28] imply that the random-order model permits a
1/8-competitive algorithm even without using the free disposal property or the conditions of Theorem 1.

There have been recent results regarding ad allocation strategies in display advertising in hybrid settings
with both contract-based advertisers and spot market advertisers [22, 21]. Our results in this paper may
be interpreted as a class of representative bidding strategies that can be used on behalf of contract-based
advertisers competing with the spot market bidders [22]. There are many other interesting problems in
ad serving systems related to information retrieval and data mining [9, 11, 10] as well as various optimal
caching strategies [33, 17]; our focus in this paper is on online allocation problems.

It was recently brought to our attention that subsequent to the submission of an earlier version of this paper
(including our main result), similar results (obtained independently) were posted in a working paper [1].



2 A Training-based PTAS 



In this section, we present the primal-dual training-based algorithm for the online stochastic packing problem,
and prove Theorem 1: that is, under mild (practically motivated) assumptions, the algorithm achieves
an approximation factor of $1 - \epsilon$.

Our algorithm examines the first $\epsilon n$ agents in order before solving a linear program to compute the
posted prices used for the remaining agents. This requires advance knowledge of the number of agents that
will arrive; Theorem 9 at the end of this section shows that this is unavoidable. Recall that there is a set $I$
of "agents"; agent $i \in I$ has a set of mutually exclusive options $O_i$, and we use an indicator variable $x_{io}$ to
denote whether agent $i$ selects option $o \in O_i$. Each option for an agent may have a different "size" in
each constraint; we use $a_{ioj}$ to denote the size in constraint $j$ of option $o$ for agent $i$. We use $w_{io}$ to denote
the value from selecting option $o$ for agent $i$, and $c_j$ is the "capacity" of constraint $j$. That is, our goal is to
maximize $w^T x$ while picking at most one option for each agent, and subject to $Ax \le c$. Subsequently, we
normalize $A, c$ such that $c$ is the all-1's vector, and write the (normalized) primal linear program below. We
also use the dual linear program, which introduces a variable $\beta_j$ for each constraint $j$ and a variable $z_i$ for
each agent $i$.



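Primal-LP

\[
\begin{array}{lll}
\max & \sum_i \sum_{o \in O_i} w_{io} x_{io} & \\
\text{s.t.} & \sum_{o \in O_i} x_{io} \le 1 & (\forall\, i) \\
 & \sum_{i,o} a_{ioj} x_{io} \le 1 & (\forall\, j) \\
 & x_{io} \ge 0 & (\forall\, i, o)
\end{array}
\]

Dual-LP

\[
\begin{array}{lll}
\min & \sum_j \beta_j + \sum_i z_i & \\
\text{s.t.} & z_i + \sum_j \beta_j a_{ioj} \ge w_{io} & (\forall\, i, o) \\
 & \beta_j, z_i \ge 0 & (\forall\, i, j)
\end{array}
\]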



Let $n$ be the total number of agents, $q = \max_i |O_i|$ be the maximum number of options for any agent,
and $m$ be the number of constraints. We say that the gain from option $o \in O_i$ is $w_{io} - \sum_j \beta^*_j a_{ioj}$,
where $\beta^*$ is the dual vector computed in step 2 below. The Training-Based Primal-Dual Algorithm proceeds
as follows:

1. Let $S$ denote the first $\epsilon n$ agents in the sequence. For the purposes of analysis, these agents
are not selected. (Our implementations may assign these impressions according to some online algorithm.)

2. Solve the Dual-LP on the agents in $S$, with the objective function containing the term $\epsilon \beta_j$ instead
of $\beta_j$ for each $j \in [m]$. (This is equivalent to reducing the capacity of a constraint from 1 to $\epsilon$; we
refer to this as a reduced instance.) Let $\beta^*_j$ denote the value of the dual variable for constraint $j$ in this
optimal solution.

3. For each subsequent agent $i$, if there is an option $o$ with non-negative gain, select the option³ $o$ of
maximum gain, and set $z_i = \mathrm{gain}(o)$.
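Step 3 is simple enough to sketch directly. The code below is a minimal illustration under stated assumptions: capacities are already normalized to 1, and the price vector `beta` (the output of step 2) is passed in rather than computed, since solving the reduced Dual-LP would require an LP solver that we do not assume here. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# An option is (value w_io, usage {j: a_ioj}); an agent is a list of options.
Option = Tuple[float, Dict[str, float]]


def gain(option: Option, beta: Dict[str, float]) -> float:
    """Value of the option minus the posted price of the resources it uses."""
    value, usage = option
    return value - sum(beta.get(j, 0.0) * a for j, a in usage.items())


def choose_option(agent: List[Option], beta: Dict[str, float]) -> Optional[int]:
    """Step 3: index of the maximum-gain option, or None if every gain is negative."""
    if not agent:
        return None
    best = max(range(len(agent)), key=lambda o: gain(agent[o], beta))
    return best if gain(agent[best], beta) >= 0 else None


def run_online(agents: List[List[Option]], beta: Dict[str, float],
               sample_size: int) -> List[Optional[int]]:
    """Skip the first sample_size agents (the sample S), then post prices."""
    decisions: List[Optional[int]] = [None] * len(agents)
    for i in range(sample_size, len(agents)):
        decisions[i] = choose_option(agents[i], beta)
    return decisions
```

Note that each decision depends only on the arriving agent and on `beta`, which is what makes the rule a posted-price mechanism.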

We will refer to a variant of this algorithm in Section 5 as the Dual Base algorithm. The intuition behind
this algorithm is simple; the dual variables $\beta_j$ can be thought of as specifying a value/size ratio necessary for
an option to be selected. An optimal choice for each $\beta_j$ gives an optimal solution to the packing problem;
this fact is proven implicitly in the next section, where we further show that with high probability, the
optimal choice $\beta^*_j$ on the sample $S$ leads to a near-optimal solution on the entire instance. In the following,
let $w_{\max} = \max_{i,o} \{w_{io}\}$, and let $a_{\max} = \max_{i,o,j} \{a_{ioj}\}$.

2.1 Proof of Theorem 1

We now prove Theorem 1, showing that the above training-based algorithm is a polynomial-time approximation
scheme. (Proofs of some claims are in the appendix.) Let $I^* \subseteq I$ denote the set of agents $i$
with some option $o$ having non-negative gain, and let $O^*$ denote the set of pairs
$\{(i, o) \mid i \in I^*, o = \arg\max_{o' \in O_i} \mathrm{gain}(o')\}$. We abuse notation by writing $i \in O^*$ if there exists $o \in O_i$ such that $(i, o) \in O^*$.
We use $O^*(S)$ to denote $O^* \cap S$; note that $O^* - O^*(S)$ represents the options selected by the algorithm (for
the purposes of analysis, we do not select any options for agents in $S$).

Given a vector $\beta^*$, we obtain a feasible solution to Dual-LP by selecting, for each agent $i \in I^*$, the option
$o$ such that $(i, o) \in O^*$ and setting $z_i = \mathrm{gain}(o)$ (and $z_i = 0$ for every other agent).
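The dual solution built from a price vector gives an upper bound on OPT by weak duality, whatever non-negative prices are used. The sketch below checks this on a tiny instance. It is illustrative only: capacities are assumed normalized to 1, the exhaustive search is our stand-in for an offline solver, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Option = Tuple[float, Dict[str, float]]   # (w_io, {j: a_ioj}), capacities normalized to 1


def dual_value(agents: List[List[Option]], beta: Dict[str, float]) -> float:
    """Objective of the dual solution (beta, z) with z_i = max(0, best gain of agent i)."""
    total = sum(beta.values())
    for options in agents:
        gains = [w - sum(beta.get(j, 0.0) * a for j, a in usage.items())
                 for w, usage in options]
        total += max([0.0] + gains)
    return total


def brute_force_opt(agents: List[List[Option]], resources: List[str]) -> float:
    """Best integral allocation by exhaustive search (tiny instances only)."""
    best = 0.0
    for choice in product(*[range(-1, len(opts)) for opts in agents]):
        used = {j: 0.0 for j in resources}
        value = 0.0
        for options, o in zip(agents, choice):
            if o < 0:
                continue  # agent left unassigned
            w, usage = options[o]
            value += w
            for j, a in usage.items():
                used[j] += a
        if all(u <= 1.0 + 1e-9 for u in used.values()):
            best = max(best, value)
    return best
```

For any `beta` with non-negative entries, `dual_value(agents, beta)` is at least `brute_force_opt(agents, resources)`; the proof below shows that the prices learned from the sample make this bound nearly tight.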

Definition 2. Let $W = \sum_{(i,o) \in O^*} w_{io}$ be the total weight of selected options, and let $W(S) = \sum_{(i,o) \in O^*(S)} w_{io}$.
Let $C(j) = \sum_{(i,o) \in O^*} a_{ioj}$ and $C(j, S) = \sum_{(i,o) \in O^*(S)} a_{ioj}$.

For any fixed vector $\beta^*$, $O^*$ and hence $W$ and each $C(j)$ are independent of the choice of the sample $S$;
the expected value of $W(S)$ is $\epsilon W$, and that of $C(j, S)$ is $\epsilon C(j)$.⁴ The main idea of the proof is that if $\beta^*$
satisfies the complementary slackness conditions on the first $\epsilon n$ impressions (being an optimal solution), then w.h.p.
it approximately satisfies these conditions on the entire set. To this end, we show that the values of $W(S)$ and
$C(j, S)$ are likely to be close to their expectations.
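The claim that $W(S)$ has expectation $\epsilon W$ for a fixed $\beta^*$ can be verified exactly on a small example by averaging over every sample of a given size. This is a sketch with made-up weights; `average_sample_sum` is a hypothetical helper, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations
from typing import List


def average_sample_sum(values: List[int], sample_size: int) -> Fraction:
    """Exact average of sum(values[i] for i in S) over all S of the given size."""
    n = len(values)
    subsets = list(combinations(range(n), sample_size))
    total = sum(sum(values[i] for i in subset) for subset in subsets)
    return Fraction(total, len(subsets))


weights = [4, 0, 7, 1, 0, 3, 5, 0]   # hypothetical w_io for pairs in O*, 0 elsewhere
n, k = len(weights), 2               # epsilon = k / n = 1/4
assert average_sample_sum(weights, k) == Fraction(k, n) * sum(weights)
```

The same computation applies to $C(j, S)$ with the sizes $a_{ioj}$ in place of the weights; the lemma that follows bounds how far a single sample can stray from this average.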

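The following lemma, proved in [18] as an application of the Chernoff-Hoeffding bounds, is of use:

Lemma 3 ([18]). Let $Y = \{Y_1, \ldots, Y_n\}$ be a set of real numbers, and let $0 < \epsilon < 1$. Let $S$ be a random
subset of $Y$ of size $\epsilon n$ and let $Y_S = \sum_{i \in S} Y_i$. Then, for any $0 < \delta < 1$:

\[
\Pr\left[ \left| Y_S - \mathbb{E}[Y_S] \right| \ge \frac{2}{3} \|Y\|_\infty \ln\left(\frac{2}{\delta}\right) + \|Y\|_2 \sqrt{2 \epsilon \ln\left(\frac{2}{\delta}\right)} \right] \le \delta
\]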



3 Assume for simplicity that there are no ties, and so there is a unique such option. This can be effectively achieved by adding a 
random perturbation to the weights; we omit details from this extended abstract. 

4 Though $\beta^*$ depends on $S$, many distinct samples $S$ may lead to the same vector $\beta^*$. Also, we take expectations over all choices
of $S$, not just those leading to the given $\beta^*$.

Definition 4. For a sample $S$ and $j \in [m]$, let $r_j(S) = |C(j, S) - \epsilon C(j)|$, and let $t(S) = |W(S) - \epsilon W|$.
When the context is clear, we will abbreviate $r_j(S)$ by $r_j$ and $t(S)$ by $t$.

1. The sample $S$ is $r_j$-bad if:

\[ r_j \ge (m+1)(\ln n + \ln q)\, a_{\max} + \sqrt{C(j)} \cdot 2\sqrt{\epsilon (m+1)(\ln n + \ln q)\, a_{\max}} \]

2. The sample $S$ is $t$-bad if:

\[ t \ge (m+1)(\ln n + \ln q)\, w_{\max} + \sqrt{W} \cdot 2\sqrt{\epsilon (m+1)(\ln n + \ln q)\, w_{\max}} \]

Lemma 5. $\Pr[S \text{ is } r_j\text{-bad}] < \frac{1}{m \cdot (nq)^{m+1}}$ for each $j$, and $\Pr[S \text{ is } t\text{-bad}] < \frac{1}{(nq)^{m+1}}$.

Proof. To prove the first of these results, we simply apply Lemma 3 with $Y_i = a_{ioj}$ if $i \in O^*$ and
$Y_i = 0$ otherwise; we use $\|Y\|_2 \le \sqrt{\|Y\|_1\, a_{\max}}$. By setting $\delta = \frac{1}{m \cdot (nq)^{m+1}}$, we obtain the desired result. (The
coefficients are larger than necessary to keep the expression simple.)

The proof of the second result is essentially identical, and hence omitted. □

We argue below that if $S$ is not $t$-bad or $r_j$-bad for any $j$, we obtain a good solution. We use the
following simple proposition:

Proposition 6. Let $j \in [m]$ be a constraint such that $C(j, S) = \epsilon$. If $S$ is not $r_j$-bad, we have
$1 - 2\epsilon \le C(j) \le 1 + 3(\epsilon + \epsilon^2)$.

Proof. To prove the former inequality, we use
$C(j, S) - \epsilon C(j) \le (m+1)(\ln nq)\, a_{\max} + \sqrt{C(j)} \cdot 2\sqrt{\epsilon (m+1)(\ln nq)\, a_{\max}}$.
As $a_{\max} \le \epsilon^3 / ((m+1)(\ln nq))$, we have $\epsilon - \epsilon C(j) \le \epsilon^3 + \sqrt{C(j)} \cdot 2\epsilon^2$;
simple algebra now yields the desired result. The proof of the upper bound is similar, and so omitted. □
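For the lower bound, the "simple algebra" can be spelled out as follows. This rearrangement is our own sketch and uses only the inequality stated in the proof together with $0 < \epsilon < 1$:

```latex
\begin{align*}
\epsilon - \epsilon C(j) \le \epsilon^3 + 2\epsilon^2 \sqrt{C(j)}
  &\iff 1 - C(j) \le \epsilon^2 + 2\epsilon \sqrt{C(j)} \\
  &\iff \bigl(\sqrt{C(j)} + \epsilon\bigr)^2 \ge 1 \\
  &\iff \sqrt{C(j)} \ge 1 - \epsilon \\
  &\implies C(j) \ge (1 - \epsilon)^2 \ge 1 - 2\epsilon .
\end{align*}
```

The first step divides by $\epsilon$, and the second completes the square in $\sqrt{C(j)}$.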

Lemma 7. If the sample $S$ is not $t$-bad or $r_j$-bad for any constraint $j$, the value of the options selected by
the algorithm is at least $(1 - O(\epsilon))\mathrm{OPT}$.

Proof. Let $D = \sum_j \beta^*_j + \sum_{i \in O^*} z_i$ be the value of the feasible dual solution obtained by setting
$z_i = \mathrm{gain}(o)$ for each $(i, o) \in O^*$; by weak duality, $D$ is an upper bound on OPT. We show that
$\sum_{(i,o) \in O^* - O^*(S)} w_{io} \ge (1 - O(\epsilon)) D$, which completes the proof.

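First, we show that $W \ge (1 - 2\epsilon) D$. Let $J_1$ denote the set of constraints $j \in [m]$ such that $\beta^*_j > 0$,
and $J_2 = [m] - J_1$ be the set of constraints such that $\beta^*_j = 0$. For each constraint $j \in J_1$, complementary
slackness and Proposition 6 imply that if $S$ is not $r_j$-bad, $C(j) \ge 1 - 2\epsilon$.

\begin{align*}
W = \sum_{(i,o) \in O^*} w_{io}
  &= \sum_{(i,o) \in O^*} \Bigl( z_i + \sum_j \beta^*_j a_{ioj} \Bigr)
   = \sum_{i \in I^*} z_i + \sum_j \beta^*_j \Bigl( \sum_{(i,o) \in O^*} a_{ioj} \Bigr) \\
  &= \sum_{i \in I^*} z_i + \sum_j \beta^*_j C(j)
   \ge \sum_{i \in I^*} z_i + \sum_{j \in J_1} \beta^*_j (1 - 2\epsilon)
   \ge (1 - 2\epsilon) D
\end{align*}

where the penultimate inequality follows from the fact that for $j \in J_2$, $\beta^*_j = 0$, and for each $j \in J_1$,
$C(j) \ge 1 - 2\epsilon$.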

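Now, the total value obtained by the algorithm is $W - W(S)$ (as the options for agents in $S$ were not
selected); as $S$ is not $t$-bad, we have
$W(S) \le \epsilon W + (m+1)(\ln nq)\, w_{\max} + 2\sqrt{W} \sqrt{\epsilon (m+1)(\ln nq) \cdot w_{\max}}$.

But we have $(m+1)(\ln nq)\, w_{\max} \le \epsilon\,\mathrm{OPT}$, and hence
$W(S) \le \epsilon W + \epsilon\,\mathrm{OPT} + 2\sqrt{W} \sqrt{\epsilon^2\,\mathrm{OPT}} \le O(\epsilon W)$, where the last step uses $\mathrm{OPT} \le D \le W / (1 - 2\epsilon)$.
That is, the value obtained by the algorithm is at least $(1 - O(\epsilon)) W$, which is $(1 - O(\epsilon))\mathrm{OPT}$. □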
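The last chain of inequalities can be made explicit. The following is a sketch under an assumption of ours, $\epsilon \le 1/4$, made only to obtain concrete constants; it uses $\mathrm{OPT} \le D \le W/(1 - 2\epsilon)$ from the first part of the proof:

```latex
\begin{align*}
W(S) &\le \epsilon W + \epsilon\,\mathrm{OPT} + 2\sqrt{W}\sqrt{\epsilon^2\,\mathrm{OPT}}
       = \epsilon W + \epsilon\,\mathrm{OPT} + 2\epsilon\sqrt{W \cdot \mathrm{OPT}} \\
     &\le \epsilon W \left( 1 + \frac{1}{1 - 2\epsilon} + \frac{2}{\sqrt{1 - 2\epsilon}} \right)
       \le 6\,\epsilon W .
\end{align*}
```

Hence the algorithm's value is $W - W(S) \ge (1 - 6\epsilon) W \ge (1 - 6\epsilon)(1 - 2\epsilon)\,\mathrm{OPT}$, which is $(1 - O(\epsilon))\mathrm{OPT}$.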
Note that the options selected by the algorithm, as described above, may not be feasible even if $S$ is not
$r_j$-bad; Proposition 6 only implies that $C(j) \le 1 + 3(\epsilon + \epsilon^2)$. Thus, we might violate some constraints by a
small amount. This is easily fixed: simply decrease the capacities of all constraints by a factor of $1 + O(\epsilon)$.
This reduces the value of the optimal solution by no more than the same factor, as we can scale down each
$x_{io}$ by this factor to obtain a feasible solution with the reduced capacities. Though our algorithm might
violate the reduced capacity of constraint $j$ by a factor of $1 + O(\epsilon)$, we respect the original capacity when
$S$ is not $r_j$-bad. Thus, when $S$ is not $t$-bad or $r_j$-bad for any $j$, we obtain a feasible solution with value
$(1 - O(\epsilon))\mathrm{OPT}$.

Finally, Lemma 5 implies that for any fixed $\beta^*$, the probability that a random sample $S$ of impressions
is bad is less than $\frac{2}{(nq)^{m+1}}$. The following lemma shows that there are at most $(nq)^m$ distinct choices for
$\beta^*$; as a result, the sample is good for any $\beta^*$ with high probability. Therefore, with high probability, our
algorithm returns a feasible solution with value at least $(1 - O(\epsilon))\mathrm{OPT}$, proving Theorem 1.

Lemma 8. There are fewer than $(nq)^m$ distinct solutions $\beta^*$ that are returned by the algorithm after step 2.

Proof. Recall that an optimal (vertex) solution to the Dual-LP on the reduced instance is defined purely by
the $m$-dimensional vector $\beta^*$. The polytope defined by optimal solutions $\beta^*$ is defined by the constraints
of the Dual-LP, projected down to linear inequalities in $m$ dimensions. Since there are at most $q$ such
constraints for each of the $n$ agents, there are at most $(nq)^m$ possible vertices of the polytope defined by
optimal solutions $\beta^*$. □

Theorem 9. Even in the full-information model, where $n$ agents drawn i.i.d. from a known distribution
arrive online, there is no $o(\log n / \log \log n)$-approximation for the online stochastic PLP unless the number
$n$ of draws from the distribution is known in advance.

Proof. The intuition behind this proof is simple: The distribution may contain agents with very high value, 
but that arrive with low probability. If there are many draws from the distribution, it is likely that such agents 
will arrive, and so some amount of resources should be reserved for them. On the other hand, an algorithm 
that reserves resources for low-probability events will waste a large fraction of its resources if there are only 
a few draws from the distribution. 

Fix $T > 1$; consider a problem with $3T \log T$ units of a single resource, and every agent wishing
precisely 1 unit of this resource. There are $T$ types of agents; agents of type $i \in \{0, \ldots, T - 1\}$ have value
$T^{2i}$ for receiving a unit of resource. The probability of drawing an agent of type $i$ is $\approx 1/T^{2i}$. (Normalize
these probabilities so they sum up to 1; this changes the probabilities by a factor of roughly $1 - 1/T^2$, which we ignore
for ease of exposition.) Thus, the distribution of agents is known to the algorithm in advance.

However, the algorithm does not know how many agents will be drawn from this distribution. Suppose
the number of draws is $6T \log T \cdot T^{2j}$, for some $j \in \{0, \ldots, T - 1\}$. It is easy to see that there will
very likely be more than $3T \log T$ agents of type $j$, and no agents of type $j + 1$ or higher. Thus, the optimal
solution has value $3T \log T \cdot T^{2j}$; the hypotheses of Theorem 1 will hold, as no item contributes too much
to the value of the optimal solution or uses too much of the shared resource.

Now consider any deterministic algorithm. If, for any $k \le j$, it has selected fewer than $3 \log T$ agents of
type $k$ after $6T \log T \cdot T^{2k}$ draws, it has a solution of value less than $3 \log T \cdot T^{2k}$ (from agents of type $k$)
plus $3T \log T \cdot T^{2k-2}$, which is $3 \log T \cdot T^{2k} (1 + o(1))$; this is roughly a factor of $T$ smaller than the optimal
solution, which has value $3T \log T \cdot T^{2k}$. Thus, to maintain an $o(T)$ competitive ratio, it must have selected
at least $3 \log T$ agents of type $k$ after $6T \log T \cdot T^{2k}$ draws, as there may be no subsequent agents. However,
this implies that after $6T \log T \cdot T^{2(T-2)}$ draws, the algorithm must have selected at least $3 \log T$ agents from
each of types $\{0, \ldots, T - 2\}$. But there are $3 \log T \cdot (T - 1)$ such agents that must have been selected,
each using a unit of the resource. Therefore, no more than $3 \log T$ agents of type $T - 1$ can be selected. But
if there are $6T \log T \cdot T^{2(T-1)}$ draws, the optimal solution has value $3T \log T \cdot T^{2(T-1)}$, and the algorithm
has value no more than $3 \log T \cdot T^{2(T-1)} (1 + o(1))$, which is less by roughly a factor of $T$.

Thus, there is no $o(T)$-competitive algorithm, and the number of draws is at most $O(T^{2T-1} \log T)$. That
is, if $n$ denotes the number of draws, there is no $o(\log n / \log \log n)$-competitive algorithm. Using Yao's
minimax principle, a similar argument can be extended to show that no randomized algorithm can obtain
good approximations; we omit details from this extended abstract. □
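The counting behind this lower bound can be illustrated numerically. The sketch below computes the expected number of arrivals of each type for a given horizon, assuming (as in the proof) that type $i$ is drawn with probability proportional to $T^{-2i}$; the function name and the concrete values of `T` and `j` are ours.

```python
import math
from typing import List


def expected_type_counts(T: int, j: int) -> List[float]:
    """Expected arrivals of each type after 6*T*log(T)*T**(2*j) i.i.d. draws."""
    weights = [float(T) ** (-2 * i) for i in range(T)]   # unnormalized Pr[type i]
    norm = sum(weights)                                  # slightly above 1
    draws = 6 * T * math.log(T) * float(T) ** (2 * j)
    return [draws * w / norm for w in weights]


T, j = 1000, 3
counts = expected_type_counts(T, j)
budget = 3 * T * math.log(T)          # units of the single resource
print(counts[j] > budget)             # enough type-j agents to use every unit
print(counts[j + 1])                  # type j+1 is rare at this horizon
```

Each step up in type divides the expected count by $T^2$ while multiplying the value by $T^2$, which is why an algorithm that does not know the horizon cannot tell whether to save capacity for the next type.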

3 Display Ad Allocation and Fairness 

Other metrics besides efficiency play an important role in measuring the quality of an allocation. In this 
section, we focus on the Display Ad Allocation (DA) problem. Recall that in the DA problem, a set $J$ of $m$
advertisers have paid a website publisher in advance for their ads to be shown to visitors to the website; for
each advertiser $j \in J$, their contract specifies an upper bound $n(j)$ on the number of impressions they wish
to pay for. Each agent/impression has a set of $m$ options corresponding to the $m$ advertisers, and must be
assigned to a single advertiser. If we assign impression $i$ to advertiser $j$, it occupies one of $j$'s $n(j)$ slots, and
we obtain value $w_{ij}$.

In addition to the overall efficiency of the allocation, an important consideration is its fairness to the
various advertisers; an advertiser who does not get his "fair share" of impressions is unlikely to purchase
further contracts for impressions in the future. Here, we propose a metric to capture the fairness of an 
allocation and present algorithms to compute it. 

Qualitatively, an allocation is "fair" if the advertisers are treated fairly relative to each other. As op- 
posed to efficiency, which is easily quantified as the sum of individual advertiser values, fairness is more 
problematic, as it is inherently a relative (rather than purely additive) measure. One natural option is to 
consider "max-min" fairness, where the goal is to maximize the minimum efficiency among the advertis- 
ers E6l |29l l30l 171 l3l l8l [T4l . While useful in some contexts, in this application max-min fairness gives too 
much attention to the most difficult-to-satisfy advertiser, abandoning overall performance. Given the diver- 
sity of demands, impression targeting criteria and edge weights, a more flexible fairness measure is needed. 
In addition, the total weight of impressions assigned to an advertiser depends not only on the eligible set of 
impressions for that advertiser, but also the competition among advertisers, i.e., if many advertisers are eligi- 
ble for the same set of (high-quality) impressions, none of these advertisers can get all of these impressions, 
and these (high-quality) impressions should be divided in some manner among the eligible advertisers. 

Since this competition is intimately related to the structure of the instance, it is difficult to quantify 
fairness in this context in a universal way; thus, in order to define a fairness measure capturing the above 
aspects, we first define an ideal (offline) fair allocation by taking into account advertisers competing for the 
same set of impressions. We define this allocation algorithmically, i.e., it is a function of the problem in- 
stance. We then compute the fairness of an arbitrary assignment of impressions to advertisers by computing 
the distance of this allocation to this ideal fair allocation. 

More precisely, we define the fairness measure as follows: Given an allocation $x_{ij}$ of impressions i to 
advertisers j, let $v_j(x) = \sum_i w_{ij} x_{ij}$ for each $j \in J$ denote the value assigned to advertiser j. The $v_j(x)$ 
can be defined for both 0/1 and fractional allocations x in which $0 \le x_{ij} \le 1$. (In a fractional allocation, the 
advertisers "share" the impression, which one could interpret as a random allocation according to the implied 
distribution.) For an allocation x, we roughly define the fairness metric as the $\ell_1$ distance between x and 
some ideal allocation x*, but where x is normalized (scaled linearly) so that it has the same efficiency as x*. 
This scaling ensures that x is judged purely based on its relative efficiency among advertisers, rather than on 
absolute efficiency. We scale x to match x* (rather than the other way around) so that we may compare the 
fairness of different allocations with a universal scale. Formally, for an allocation x, let $V(x) = \sum_{j \in J} v_j(x)$. 
We define the fairness measure f(x) as 

\[ f(x) = \sum_{j \in J} \left| \frac{V(x^*)}{V(x)} \, v_j(x) - v_j(x^*) \right| . \]

Thus, the smaller f(x) the fairer is allocation x. Now, in order to complete the definition of the fairness 
measure, it remains to define the offline ideal fair allocation x* . 
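
As an illustration, the fairness measure can be evaluated directly from its definition once an ideal allocation x* is available. The following Python sketch is a minimal example under stated assumptions: allocations and weights are represented as dictionaries keyed by (impression, advertiser) pairs, and the function names `advertiser_values` and `fairness` are hypothetical, chosen for this example only.

```python
def advertiser_values(x, w):
    """Return {j: v_j(x)} where v_j(x) = sum_i w[i, j] * x[i, j]."""
    values = {}
    for (i, j), share in x.items():
        values[j] = values.get(j, 0.0) + w[(i, j)] * share
    return values


def fairness(x, x_star, w):
    """l1 distance between x (rescaled to the efficiency of x_star) and x_star."""
    v = advertiser_values(x, w)
    v_star = advertiser_values(x_star, w)
    total, total_star = sum(v.values()), sum(v_star.values())
    if total == 0:
        raise ValueError("x has zero efficiency and cannot be rescaled")
    scale = total_star / total  # V(x*) / V(x)
    advertisers = set(v) | set(v_star)
    return sum(abs(scale * v.get(j, 0.0) - v_star.get(j, 0.0)) for j in advertisers)


# Hypothetical data: two impressions (1, 2) and two advertisers (a, b).
w = {(1, "a"): 100, (2, "a"): 10, (1, "b"): 4, (2, "b"): 6}
x_star = {(1, "a"): 1.0, (2, "b"): 1.0}
x_half = {key: 0.5 for key in w}
print(fairness(x_star, x_star, w))  # 0.0: the ideal allocation is perfectly fair
print(fairness(x_half, x_star, w))
```

Smaller values indicate fairer allocations, and the ideal allocation itself always scores 0.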



3.1 Offline Fair Allocations 

In this section, we discuss various natural offline fair allocations x* that can be used in the 
fairness measure defined above. As we discussed earlier, such an ideal fair allocation depends on the eligible set 
of impressions, and the set of advertisers competing for the same impressions. Consider the set of impressions 
eligible for advertiser j, who has demand n(j). Assuming that the weights $w_{ij}$ capture the quality/relevance of 
impression i for advertiser j, in an ideal situation, advertiser j would like to get the n(j) eligible impressions 
with the maximum weight. In other words, ordering the eligible impressions in non-increasing order of 
their weight to j, advertiser j would ideally want to get a prefix of n(j) impressions in this order. However, 
it might not be possible for each advertiser j to get a prefix of the first n(j) impressions in his ideal order, 
since an impression i may appear in the prefix of several advertisers. In such situations, we should resolve 
the conflict (competition) of interested advertisers for this impression i in a fair way, and extend the prefix 
of the affected advertisers. 

Since we allow the offline fair allocation x* to be fractional, this competition may be resolved by sharing 
each impression among all interested advertisers. A natural fair way of sharing an impression i among a set 
J(i) of interested advertisers is to divide this impression i equally among all advertisers in J(i), i.e., each 
advertiser $j \in J(i)$ gets a fraction $1/|J(i)|$ of impression i. We call this method the equal sharing method (we 
discuss other sharing methods later). 

Given an arbitrary sharing policy like the equal sharing policy defined above, we formally define the 
notion of a fair allocation x* in terms of this policy: 

Definition 10. Let H be a sharing policy mapping the advertisers j interested in impression i to a fractional 
allocation $\{x_{ij}\}_{j \in J}$. A fractional allocation x* is fair under H if 

• for each advertiser j, the set of impressions that j is interested in is a prefix of all impressions (ordered 
by $w_{ij}$), 

• the allocation x* represents the policy H applied to each impression, and 

• each advertiser is either interested in all impressions, or is receiving at least n(j) impressions under 
x*. 

An alternate way of thinking of a fair allocation is in terms of a game, where each advertiser declares 
a set of impressions they are interested in, and the mechanism then applies H to these declarations. A fair 
allocation is then any Nash equilibrium of this game. 
We call a fair allocation under equal sharing an equal share allocation. One can compute one such fair 
allocation x*, using an iterative method, as follows: 



Fair Allocation algorithm 

1. Maintain allocation variables $\{x_{ij} : i \in I, j \in J\}$ and prefix "pointers" $\{p(j) : j \in J\}$. Initialize all 
$x_{ij} = 0$ and $p(j) = 0$. 

2. Until all advertisers are satisfied, i.e., for each advertiser j either $\sum_{i \in I} x_{ij} \ge n(j)$ or $p(j) = n$: 

(a) Let j be some unsatisfied advertiser. Increase p(j) by one, and let i be the p(j)-th best impression 
in j's preference order. Also, let J(i) be the set of all advertisers j' for whom i is among the 
p(j') best impressions for that advertiser (and note $j \in J(i)$). Set $x_{ij'}$ according to H for all 
$j' \in J(i)$. (For example, under equal sharing, we set $x_{ij'} = 1/|J(i)|$.) 
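
The steps above can be sketched in Python for the special case of equal sharing. This is a minimal sketch, not the implementation used in the experiments. It assumes that each advertiser's preference list contains only its eligible impressions, so an advertiser counts as satisfied once its pointer reaches the end of that list; the names `fair_allocation_equal_sharing` and `total_value` and the dictionary-based data layout are hypothetical.

```python
def fair_allocation_equal_sharing(weights, capacity):
    """Extend prefixes one step at a time until every advertiser is satisfied.

    weights[j][i] is w_ij for each impression i eligible for advertiser j;
    capacity[j] is n(j). Returns {(i, j): share} under equal sharing.
    """
    # Preference order of each advertiser: eligible impressions by non-increasing weight.
    prefs = {j: sorted(w, key=w.get, reverse=True) for j, w in weights.items()}
    pointer = {j: 0 for j in weights}  # p(j): length of the prefix j has requested
    interested = {}                    # J(i): advertisers whose prefix contains i

    def received(j):
        # Fractional number of impressions j receives: 1/|J(i)| per requested i.
        return sum(1.0 / len(interested[i]) for i in prefs[j][:pointer[j]])

    def unsatisfied(j):
        return pointer[j] < len(prefs[j]) and received(j) < capacity[j] - 1e-12

    while True:
        pending = [j for j in weights if unsatisfied(j)]
        if not pending:
            break
        j = pending[0]
        i = prefs[j][pointer[j]]  # next-best impression for j
        pointer[j] += 1
        interested.setdefault(i, []).append(j)

    return {(i, j): 1.0 / len(js) for i, js in interested.items() for j in js}


def total_value(x, weights):
    """Efficiency of allocation x: sum of w_ij * x_ij."""
    return sum(weights[j][i] * share for (i, j), share in x.items())


# Two impressions, two advertisers with capacity one each.
weights = {"a": {1: 100, 2: 10}, "b": {1: 4, 2: 6}}
x = fair_allocation_equal_sharing(weights, {"a": 1, "b": 1})
print(total_value(x, weights))  # 106.0
```

Because shares are recomputed from the current sets J(i), an advertiser that was satisfied earlier can become unsatisfied again when a competitor joins one of its impressions; the loop terminates because pointers only advance.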

Note that there could be many different fair allocations, each with different efficiency. For example, 
suppose there were two impressions I = {1, 2}, and two advertisers J = {a, b}, each with capacity one. 
Now suppose $w_{1,a} = 100$, $w_{2,a} = 10$, $w_{1,b} = 4$, $w_{2,b} = 6$. Then $x_{1,a} = x_{2,a} = x_{1,b} = x_{2,b} = \frac{1}{2}$ is a 
fair allocation with value 60; the allocation $x_{1,a} = x_{2,b} = 1$, $x_{1,b} = x_{2,a} = 0$ is also fair and has value 
106. However, the following theorem shows that the given algorithm always finds the most efficient fair 
allocation. 

Theorem 11. The Fair Allocation algorithm runs in polynomial time and computes an offline fair allocation 
under any sharing policy where adding an advertiser to the set of interested advertisers does not increase 
the share of any other advertiser. Moreover, for any sharing policy H, this algorithm produces the most 
efficient allocation among all fair allocations under H. 

Proof. In each iteration of the algorithm, one pointer advances, and therefore the number of iterations is 
bounded by the number of edges in the allocation graph, which is polynomial. To see that it produces the 
most efficient allocation under any sharing policy H, we use the following definition: Let $x_1$ and $x_2$ be two 
fair allocations under H, and let $I_1(j)$, $I_2(j)$ be the sets of impressions advertiser j is interested in for $x_1$ 
and $x_2$ respectively. Now, $x_1$ is said to be shorter than $x_2$ if $I_1(j) \subseteq I_2(j)$ for each advertiser j, and the 
containment is strict for some advertiser. 

We show that there exists a unique shortest fair allocation: Let $x_1$ and $x_2$ be fair allocations under H 
such that neither is shorter than the other, and define a new allocation in which each advertiser j is interested 
in impressions $I(j) = I_1(j) \cap I_2(j)$ (i.e., j requests the shorter of the prefixes $I_1(j)$ and $I_2(j)$). It is easy to 
see that the number of impressions j receives in the new allocation is at least the minimum of the number it 
receives in $x_1$ and $x_2$, and hence at least n(j). (This may be less than n(j) if j is interested in all impressions 
in both $x_1$ and $x_2$, but in this case, j is interested in all impressions in the new allocation.) 

Let x* be the unique shortest allocation, and let $I^*(j)$ denote the set of impressions advertiser j is 
interested in. To see that our algorithm returns x*, consider the first step of the algorithm in which p(j) 
moves beyond $I^*(j)$ for any advertiser j: Since each other advertiser has so far requested a set of impressions 
no larger than the set it requests for x* and j receives n(j) impressions under x*, j already receives n(j) 
impressions under our algorithm. Thus, j would not have been unsatisfied and the prefix pointer p(j) would 
not have been incremented, a contradiction. 

Finally, it is easy to verify that for any fair allocations $x_1$, $x_2$, if $x_1$ is shorter than $x_2$, then $x_1$ is at least 
as efficient as $x_2$. This follows from the facts that $I_1(j)$ is a prefix of $I_2(j)$ when impressions are ordered 
by $w_{ij}$, and that for each impression in $I_1(j)$, j receives a share in $x_1$ that is at least as large as it does in 
$x_2$. □ 



We can describe other variants of this fair allocation by altering how we share an impression among 
those interested in it. One natural way to do this is to divide an impression i among all advertisers in J(i), 
proportional to the weight of impression i for these advertisers, i.e., each advertiser $j \in J(i)$ gets a fraction 
$w_{ij} / \sum_{j' \in J(i)} w_{ij'}$ of impression i. We call this sharing method the proportional sharing method. By a similar 
argument to that of Theorem 11, we can show that the algorithm runs in polynomial time. Later, we will 
discuss the efficiency of such a fair allocation. 

Inspired by the idea of stable matchings, one can also define an extreme way of sharing an impression i 
among advertisers by introducing a strict preference order for each impression, and giving this impression 
i to an interested advertiser in J(i) with the highest priority in the preference order of impression i. In 
particular, a natural preference order for impression i is to order advertisers in non-increasing order of their 
weight for impression i, i.e., $w_{ij_1} \ge w_{ij_2} \ge \dots \ge w_{ij_k}$. We call this sharing method the stable-matching 
sharing method. Although this allocation may have some features that do not seem "fair", an advantage of 
this definition is that it achieves approximate efficiency. 
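
To make the three sharing methods concrete, the following Python sketch expresses each one as a function from the set J(i) of interested advertisers (and their weights for impression i) to fractional shares. It is only an illustration: the function names are hypothetical, proportional sharing assumes a positive total weight, and ties under stable-matching sharing are broken by list order, which is an assumption standing in for the strict preference order.

```python
def equal_share(interested, weight):
    """Each interested advertiser gets 1/|J(i)| of the impression."""
    return {j: 1.0 / len(interested) for j in interested}


def proportional_share(interested, weight):
    """Advertiser j gets w_ij divided by the total weight over J(i)."""
    total = sum(weight[j] for j in interested)
    return {j: weight[j] / total for j in interested}


def stable_matching_share(interested, weight):
    """The whole impression goes to the interested advertiser with the largest weight."""
    best = max(interested, key=lambda j: weight[j])  # first maximum wins ties
    return {j: 1.0 if j == best else 0.0 for j in interested}


# Hypothetical impression wanted by advertisers a and b with weights 3 and 1.
interested, weight = ["a", "b"], {"a": 3.0, "b": 1.0}
print(equal_share(interested, weight))            # {'a': 0.5, 'b': 0.5}
print(proportional_share(interested, weight))     # {'a': 0.75, 'b': 0.25}
print(stable_matching_share(interested, weight))  # {'a': 1.0, 'b': 0.0}
```

All three policies satisfy the condition of Theorem 11 on this representation: adding an advertiser to `interested` never increases the share of an advertiser already present.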

Theorem 12. The efficiency of the stable-matching sharing method is at least $\frac{1}{2}$ of that of the allocation with 
maximum efficiency. Moreover, the efficiency of the equal-sharing and the proportional-sharing method can 
be arbitrarily far from the optimum. 

Proof. First, we observe that the equal- and proportional-sharing methods can result in a fair allocation 
with arbitrarily bad performance. Consider $K^2$ advertisers; advertiser i has value $\epsilon < 1/K^2$ for impression i. 
In addition, there is one special impression; advertiser 1 has value K for it, and all other advertisers have 
value 1 for it. Every advertiser wants 1 impression. The maximum weight matching gets value at least 
K, by giving the special impression to advertiser 1, and giving every other advertiser i impression i. Under 
the proportional sharing method, for the special impression (everyone's first choice), the total value 
over the advertisers who want it is $K + (K^2 - 1)$. As a result, the first advertiser only gets roughly 1/K of the special 
impression, and therefore, the fair matching with proportional sharing is not efficient. The same example 
shows that the equal sharing method may also result in an inefficient fair allocation. 

Now, for the stable-matching sharing method, one can verify that the fair allocation in this setting is 
equivalent to a Nash equilibrium of a market sharing game defined as follows: The players are advertisers 
and markets are impressions I. Each player j can play a subset $S_j \subseteq I$ of size at most n(j) of impressions, 
and the weight of each impression goes to a player who has this impression in her item set $S_j$. It is not 
hard to show that this game is a valid-utility game with a submodular social function equal to the weight 
of the corresponding matching in an equilibrium. It follows by a known result of Vetta [35] that the price 
of anarchy of Nash equilibria in these games is $\frac{1}{2}$, and this implies that the value of the fair matching with 
the stable-matching sharing rule is at least $\frac{1}{2}$ of the optimum solution. □ 
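
The gap in the first part of the proof can be checked numerically. The Python sketch below evaluates the instance with $K^2$ advertisers under proportional sharing. It rests on assumptions that are not spelled out in the text: ordinary impression i is eligible only for advertiser i, every advertiser extends its prefix to its own impression because its share of the special impression is below 1, and all received value is counted (which can only overstate the fair allocation). The function name `proportional_instance_values` is hypothetical.

```python
def proportional_instance_values(K, eps):
    """Return (value of the fair allocation, lower bound on the optimum)."""
    n_adv = K * K
    total_special = K + (n_adv - 1)     # total weight competing for the special impression
    share_first = K / total_special     # advertiser 1's share, roughly 1/K
    share_other = 1 / total_special     # share of each other advertiser
    fair_value = (
        K * share_first                 # advertiser 1's value from the special impression
        + (n_adv - 1) * 1 * share_other # the others' value from the special impression
        + n_adv * eps                   # everyone also takes their own impression
    )
    opt_lower_bound = K + (n_adv - 1) * eps
    return fair_value, opt_lower_bound


for K in (10, 100, 1000):
    fair_value, opt = proportional_instance_values(K, eps=0.5 / K**2)
    print(K, fair_value, opt)
```

Under these assumptions the fair allocation stays below 3 for every K while the optimum grows at least linearly in K, so the ratio is unbounded.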

Even though, in the worst case, the equal sharing method may result in an arbitrarily inefficient allocation, 
in practice it seems that the efficiency of the equal-sharing allocation is on the same order of magnitude 
as the optimum efficiency (we will show this in our experiments in Section 5). 

4 Online Heuristic Algorithms 

In this section, we list a set of online competitive algorithms for the display ad allocation problem that we 
will study in our experimental evaluation. Some of these algorithms are already known and analyzed for 
their theoretical worst-case performance [19], and some are combinations of the algorithms studied in this 
paper. 

All of these algorithms can be described based on the primal and dual linear programming formulations 
for the display ad allocation problem studied in Section 2. In fact, we can interpret these algorithms as 
simultaneously constructing feasible solutions to the primal and dual LPs, using the following outline: 

• For each advertiser j, initialize dual variable $\beta_j$ to 0. 

• When an impression i arrives online, assign i to the advertiser $j' \in J$ that maximizes $w_{ij'} - \beta_{j'}$. (If 
this value is negative for each j, we may leave impression i unassigned.) Set $x_{ij'} = 1$. 

• If j' previously had n(j') impressions assigned, let i' be the least valuable of these; set $x_{i'j'} = 0$. 

• In the dual solution, set $z_i = w_{ij'} - \beta_{j'}$ and increase $\beta_{j'}$ using an appropriate update rule (see below); 
different update rules give rise to different algorithms/allocations. 

In order to define different variants of this algorithm, we should define the update rule for the dual 
variables. 

1. Greedy Algorithm (GREEDY): For each advertiser j, $\beta_j$ is the weight of the lightest impression among 
the n(j) heaviest impressions currently assigned to j. That is, $\beta_j$ is the weight of the impression 
which will be discarded if j receives a new high-value impression. An equivalent interpretation of 
this algorithm is to assign each impression to the advertiser with the maximum marginal increase in 
the weight of the matching. 

2. Uniform Average (PD_AVG): For each advertiser j, $\beta_j$ is the average weight of the n(j) most 
valuable impressions currently assigned to j. If j has fewer than n(j) assigned impressions, $\beta_j$ is the 
ratio between the total weight of assigned impressions and n(j). 

3. Exponential Weighted Average (PD_EXP): For each advertiser j, $\beta_j$ is an "exponentially weighted 
average" of the n(j) most valuable impressions, defined as follows: Let $w_1, w_2, \ldots, w_{n(j)}$ be the 
weights of impressions currently assigned to advertiser j, sorted in non-increasing order. Let 

\[ \beta_j = \frac{1}{n(j)\left(\left(1 + 1/n(j)\right)^{n(j)} - 1\right)} \sum_{k=1}^{n(j)} w_k \left(1 + \frac{1}{n(j)}\right)^{k-1} . \]
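
The outline and the three update rules can be combined into a short Python sketch. It is a minimal illustration under stated assumptions rather than the code used in the experiments: impressions arrive as dictionaries mapping eligible advertisers to weights, empty slots are treated as impressions of weight 0 (so GREEDY uses $\beta_j = 0$ while j has free capacity), and PD_EXP uses weights $(1 + 1/n(j))^{k-1}$ normalized to sum to one. All function names are hypothetical.

```python
def beta_greedy(kept, n):
    # Weight of the impression that would be discarded next; 0 while slots are free.
    return min(kept) if len(kept) >= n else 0.0


def beta_avg(kept, n):
    # Total kept weight divided by n(j), also when fewer than n(j) are kept.
    return sum(kept) / n


def beta_exp(kept, n):
    # Exponentially weighted average; missing impressions count as weight 0.
    ws = sorted(kept, reverse=True) + [0.0] * (n - len(kept))
    r = 1.0 + 1.0 / n
    return sum(w * r ** k for k, w in enumerate(ws)) / (n * (r ** n - 1.0))


def run_online(impressions, capacity, update_rule):
    """Process impressions in arrival order; return the total weight kept."""
    kept = {j: [] for j in capacity}    # weights currently assigned to j
    beta = {j: 0.0 for j in capacity}   # dual variables
    for weights in impressions:         # weights: {advertiser: w_ij}
        if not weights:
            continue
        j = max(weights, key=lambda a: weights[a] - beta[a])
        if weights[j] - beta[j] < 0:
            continue                    # leave the impression unassigned
        kept[j].append(weights[j])
        if len(kept[j]) > capacity[j]:
            kept[j].remove(min(kept[j]))  # drop the least valuable impression
        beta[j] = update_rule(kept[j], capacity[j])
    return sum(sum(ws) for ws in kept.values())


# Hypothetical arrival sequence with advertisers a (capacity 2) and b (capacity 1).
stream = [{"a": 10, "b": 9}, {"a": 8, "b": 6}, {"a": 7}]
for rule in (beta_greedy, beta_avg, beta_exp):
    print(rule.__name__, run_online(stream, {"a": 2, "b": 1}, rule))
```

A quick sanity check on the PD_EXP rule: when all n(j) kept impressions have the same weight w, the weighted average equals w.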

In the previous paper [19], the authors prove that the GREEDY, PD_AVG, and PD_EXP algorithms achieve 
worst-case competitive ratios of $\frac{1}{2}$, $\frac{1}{2}$, and $1 - \frac{1}{e}$ respectively. In this paper, we will compare these online 
algorithms with a training-based algorithm which is based on computing dual variables $\beta$ based on some 
sample data, and then applying these fixed dual variables for the rest of the algorithm. 

We also study a hybrid algorithm, called HYBRID, combining the training-based online algorithm from 
Section 2 and a pure online algorithm. This algorithm is inspired by ideas of Mahdian, Nazerzadeh, and 
Saberi [31]. In this hybrid algorithm, we set $\beta_j$ for each advertiser j to be a convex combination of two 
algorithms: Let $\beta_j^1$ be the dual variable learnt by the training-based algorithm and remaining fixed throughout 
the algorithm, and let $\beta_j^2$ be the dual variable as currently used by PD_AVG. We set $\beta_j = \alpha \beta_j^1 + (1 - \alpha) \beta_j^2$ for 
some $0 \le \alpha \le 1$. Initially we set $\alpha = 1$ and we decrease $\alpha$ gradually throughout the algorithm until it hits 
0. Thus the algorithm starts using the fixed $\beta^1$ values and gradually switches to the $\beta^2$ values, which in turn 
change as impressions are processed. As we will see in the experimental results, this algorithm outperforms 
both the training-based and the PD_AVG algorithm. 
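
The convex combination used by HYBRID is simple to state in code. In the Python sketch below, the linear decay of the mixing parameter over a fixed horizon is purely an assumption made for illustration, since the text only says that the parameter decreases gradually from 1 to 0; the names `hybrid_beta` and `linear_alpha` are hypothetical.

```python
def hybrid_beta(beta_trained, beta_online, alpha):
    """Convex combination of the trained dual and the current PD_AVG dual."""
    return alpha * beta_trained + (1.0 - alpha) * beta_online


def linear_alpha(t, horizon):
    """Assumed schedule: alpha falls linearly from 1 at t = 0 to 0 at t = horizon."""
    return max(0.0, 1.0 - t / horizon)


# Hypothetical values: trained dual 4.0, online dual 2.0, 10 impressions expected.
for t in (0, 5, 10):
    print(t, hybrid_beta(4.0, 2.0, linear_alpha(t, horizon=10)))
```

With this schedule the dual used for advertiser j equals the trained value at the start and the PD_AVG value from time `horizon` onward.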






Publishers        A           B           C           D           E             F
m               109        1117         636        1586        2585          1113
n          5 x 10^5    4 x 10^5    2 x 10^5    9 x 10^5    1.5 x 10^6    4 x 10^5

Table 1: Number of advertisers (m) and number of arriving impressions (n) for each of the six publishers. 



Publishers       A       B       C       D       E       F     Avg
LP_WEIGHT      100     100     100     100     100     100     100
FAIR          88.2    98.4    73.6    42.3    74.6    53.3    71.7
DualBase        85      93    85.7      74    91.8    93.5    87.2
HYBRID          85    93.8    95.2    73.8    92.7    93.5      89
GREEDY          64    90.5    69.7    53.6      55    86.2    69.8
PD_AVG          72    93.2    75.3    65.3    71.7    89.5    77.8
PD_EXP        72.6    89.7    73.9    90.8    72.6    96.3    82.6

Table 2: Normalized efficiency of different algorithms for different publishers and averaged over all 
publishers. All numbers are normalized such that the efficiency of OPT = LP_WEIGHT is 100. 



5 Experimental Evaluation 

In this section, we discuss the experimental results comparing the efficiency and fairness of the algorithms 
discussed in this paper. 

Data Set. Our sample data set consists of a uniform sample of arriving impressions and a set of 
advertisers for six different publishers (A-F) over one week. The number of arriving impressions varies from 
200,000 to 1,500,000, and the number of advertisers per publisher varies from 100 to 2,600 
(see Table 1). Each impression is tagged with its set of eligible advertisers, and an edge weight 
for each eligible advertiser capturing the "quality score" for assigning this impression to this advertiser. The 
distribution of edge weights approximately follows the log-normal distribution. 

The Algorithms. We examine (a) three pure online algorithms, (b) two training-based online algorithms, 
and (c) two offline algorithms. (a) The pure online algorithms are GREEDY, PD_AVG, and PD_EXP; 
see Section 4. (b) For the training-based online algorithm we use the primal-dual based algorithm from 


Publishers       A       B       C       D       E       F     Avg
FAIR             0       0       0       0       0       0       0
LP_WEIGHT     34.6    47.7    98.8     100    70.3    90.1    73.6
DualBase      69.5    62.5    96.7    43.1    87.9    88.6    74.7
HYBRID        69.4    63.1     100    41.9    83.7    88.6    74.5
GREEDY         100     100    98.6      45     100     100    90.6
PD_AVG          73      72    82.7    31.7    91.9    85.3    72.8
PD_EXP        69.7    59.5    86.1      71    88.8     100    79.2

Table 3: Normalized fairness of different algorithms for different publishers and averaged over all publishers. 
All numbers of each column are normalized between zero and 100, where 0 is the most fair solution. 

[Figure 1 image not recoverable from the extracted text. Surviving labels: lp_weight, fair; legend: lp-weight, fair, dualbase, pd-avg; axis label: Efficiency.] 

Figure 1: Efficiency and fairness of algorithms for Publisher B (left). Comparison of efficiency of different 
advertisers for Publisher B (right). Advertisers are sorted by their maximum possible efficiency (given by 
the inverted triangle). 



Section 2, called DualBase, and the HYBRID algorithm from Section 4. For both of them we construct the 
training data as follows: For each data set, sample 1% of the impressions uniformly and use it for training. 
The remaining 99% of the impressions are used as a test set. With this sampling step we hope to approximate 
the random order model, since in the random order model a sample of the whole data is equivalent to a 
sample from the beginning part of the sequence. (c) As offline algorithms we use the fair algorithm using 
equal sharing, called FAIR and described in Section 3, and the algorithm LP_WEIGHT, which computes the 
optimal efficient assignment (i.e., the maximum weight b-matching). The latter is computed by solving the 
primal LP using the GLPK LP solver. 
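
The training/test construction described above can be sketched as follows. This is a minimal Python illustration under assumptions: impressions are held in a list, the split is drawn with a seeded pseudo-random generator for reproducibility, and the function name `split_train_test` is hypothetical. It does not reproduce the sampling code used for the experiments.

```python
import random


def split_train_test(impressions, train_fraction=0.01, seed=0):
    """Uniformly sample a training set; the remaining impressions form the test set."""
    rng = random.Random(seed)
    k = max(1, round(train_fraction * len(impressions)))
    train_idx = set(rng.sample(range(len(impressions)), k))
    train = [imp for idx, imp in enumerate(impressions) if idx in train_idx]
    test = [imp for idx, imp in enumerate(impressions) if idx not in train_idx]
    return train, test


# Hypothetical stream of 1000 impression identifiers.
train, test = split_train_test(list(range(1000)))
print(len(train), len(test))  # 10 990
```

Both parts keep the original arrival order, so the test set can be replayed as an online sequence.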





[Figure 2 image not recoverable from the extracted text. Surviving labels: lp_weight, pd-exp, greedy, hybrid, dualbase, pd-avg, fair; legend: lp-weight, fair, dualbase, pd-avg; axis label: Efficiency.] 



Figure 2: Efficiency and fairness of algorithms for Publisher D (left). Comparison of efficiency of different 
advertisers for Publisher D (right). Advertisers are sorted by their maximum possible efficiency (given by 
the inverted triangle). 

Experimental Results. The efficiency and (normalized) fairness of the output of each of the algorithms 
are summarized in Tables 2 and 3. The results for three representative publishers are additionally depicted 
in Figures 1, 2, and 3. Recall that we normalized efficiency so that the efficiency-optimal algorithm 

[Figure 3 image not recoverable from the extracted text. Surviving labels: pd-exp, pd-avg, fair; legend: lp-weight, fair, dualbase, pd-avg; axis label: Efficiency.] 



Figure 3: Efficiency and fairness of algorithms for Publisher C (left). Comparison of efficiency of different 
advertisers for Publisher C (right). Advertisers are sorted by their maximum possible efficiency (given by 
the inverted triangle). 

LP_WEIGHT has efficiency 100. Table 2 shows that (1) the training-based algorithms clearly outperform the 
pure online algorithms, (2) of the pure online algorithms, both PD_AVG and PD_EXP outperform GREEDY, 
and (3) HYBRID and DualBase perform very similarly, except for one publisher where HYBRID clearly 
outperforms DualBase. 

Table 3 shows normalized fairness. Since the value of fairness depends on the values assigned to 
advertisers and different publishers have different advertisers, we normalized the fairness values for each 
publisher so that the least fair algorithm achieves a score of 100 and algorithm FAIR achieves a score of 0. 
Normalizing allows us to compute the average over different publishers. The results in the table indicate that 
GREEDY is the least fair algorithm. The remaining algorithms, including LP_WEIGHT, perform roughly 
the same, though their performance differs over different publishers. 
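
The per-publisher normalization can be written as a one-line rescaling. Because FAIR coincides with the ideal allocation, its raw fairness value is 0, so dividing by the largest raw value and multiplying by 100 maps FAIR to 0 and the least fair algorithm to 100. The Python sketch below uses made-up raw values purely for illustration; the function name `normalize_fairness` is hypothetical.

```python
def normalize_fairness(raw):
    """Rescale raw fairness values for one publisher to the range [0, 100]."""
    worst = max(raw.values())
    if worst == 0:
        return {alg: 0.0 for alg in raw}  # every algorithm matches the ideal allocation
    return {alg: 100.0 * value / worst for alg, value in raw.items()}


# Made-up raw f(x) values for one publisher.
raw = {"FAIR": 0.0, "GREEDY": 8.0, "PD_AVG": 6.0}
print(normalize_fairness(raw))  # {'FAIR': 0.0, 'GREEDY': 100.0, 'PD_AVG': 75.0}
```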

Figures 1-3 plot efficiency vs. (unnormalized) fairness, and they additionally show the efficiency achieved 
for the top 10 advertisers for four of the algorithms. The inverted triangle above each advertiser represents 
the maximum possible efficiency for this advertiser if the other advertisers did not exist. There are three 
rough categories, and the publishers for which we show this data each represent a different category: For 
publisher B in Figure 1, the maximum possible efficiency of the top advertisers is almost the same as the 
efficiency achieved by all algorithms. This publisher is undersold with little competition between the 
advertisers. Thus, for this publisher, the choice of algorithm does not heavily influence efficiency. Table 2 
shows that for publisher B all algorithms, including FAIR, achieve an efficiency of about 90 or above. The 
situation is similar for publisher A (not shown). In both settings FAIR has an impressively high efficiency and 
LP_WEIGHT achieves a good fairness value. In such a low-competition situation the online algorithms are 
at a clear disadvantage relative to the offline algorithms. Also, the training-based online algorithms outperform 
the pure online algorithms as they can leverage their knowledge about the data to construct a more efficient 
and more fair solution. 

Publisher D in Figure 2 shows the other extreme: Here the maximum possible efficiency of the top 
advertisers is much larger than the efficiency achieved by any of the algorithms, including the optimum 
LP_WEIGHT. This publisher has a lot of competition between the advertisers. Publisher F (not shown) 
is in a similar, but a bit less extreme situation. In both cases, the choice of an algorithm has a large 
influence on the efficiency, as can be seen in Table 2. Algorithm FAIR distributes the weight more evenly 
across the advertisers than any of the other algorithms, but also achieves only an efficiency of about 42 
and 53, respectively. Algorithm LP_WEIGHT, on the other hand, generates a very uneven distribution of weights, giving 
a lot of efficiency to advertisers 1 and 8. For both publishers PD_EXP clearly outperforms the other non-optimal 
algorithms. PD_EXP also has better theoretical performance. 

Finally, publishers C (in Figure 3) and E (not shown) represent the "in-between" situation: The maximum 
possible efficiency of the top advertisers is somewhat larger than the efficiency achieved by the algorithms, 
but there is not a large gap. In both cases the training-based algorithms clearly outperform the pure online 
algorithms in efficiency. Thus, this is the situation where learning clearly helps in terms of efficiency. 

Overall we draw the following conclusions: 

Algorithm PD_AVG generally achieves much better efficiency and fairness than GREEDY, even though 
both algorithms are $\frac{1}{2}$-competitive in the worst case. Algorithm PD_AVG also results in the fairest solution 
among all algorithms, and GREEDY has the worst fairness measure. 

The training-based algorithms generally achieve higher efficiency than the pure online algorithms, 
especially in settings that are not too extreme, i.e., oversold or undersold. On average, DualBase improves 
12% over PD_AVG, and 5% over PD_EXP. Furthermore, HYBRID has a marginal improvement (of 2% on 
average, and up to 10%) over DualBase, mostly based on a big improvement for one publisher. 

Though the worst-case competitive analysis of PD_EXP is much better than that of PD_AVG, this algorithm 
showed only 5% overall improvement over PD_AVG, and in one case showed a significant loss in efficiency. 
However, in highly competitive settings, PD_EXP gives large improvements. 

6 Concluding Remarks 

In this paper, we give a training-based algorithm for online allocation, and prove that in the random-order 
stochastic model, it achieves a $(1 - \epsilon)$-approximation to the optimal solution under mild assumptions. 

We also considered the Display Ad Allocation problem from both a theoretical and empirical perspective, 
studying fairness in addition to efficiency. We introduced different notions of offline fair allocations, 
and presented a new fairness measure as a distance to such offline fair allocations. Finally, we performed an 
experimental evaluation of our training-based algorithm, along with previously studied online algorithms 
and some hybrid algorithms. We compared their performance on data sets from real display ad allocation 
problems; our experiments show that among the pure online algorithms designed for worst-case inputs, 
PD_AVG performs reasonably well in terms of both efficiency and fairness, and PD_EXP gives large im- 
provements for more difficult instances. The training-based algorithm outperforms PD_AVG and PD_EXP 
by a large factor, and combining pure online and training-based methods in a hybrid algorithm improves the 
efficiency further. 

This paper motivates many open problems to explore: (i) Can we achieve an algorithm that is 
simultaneously good both in the worst case and in stochastic settings? Such an algorithm would be of use when 
the actual distribution of agents is different from the one predicted/learnt from a sample; in the display ad 
setting, this occurs when there is a sudden spike in traffic to a website, perhaps in response to a breaking 
news event, or links from an extremely high-traffic source. (ii) Can we design an online allocation algorithm 
that provably achieves approximate efficiency and approximate fairness (for an appropriate notion of 
fairness) at the same time? (iii) Can we prove that in certain settings that appear in practice, the PD_AVG 
algorithm achieves an improved approximation factor (i.e., better than $\frac{1}{2}$)? (iv) Can we extend the online 
stochastic algorithm studied in this paper to other stochastic process models such as Markov-based 
stochastic models? Answering these questions is an interesting subject of future research. 